
feat: tailer rotation, mobile SSE reconnect, hook observability #6

Merged: aksOps merged 4 commits into main from feat/observability-and-tailer-rotation on May 1, 2026

aksOps commented on May 1, 2026

Summary

Three independent fixes plus a gitignore cleanup, all bundled because they were diagnosed and shipped together while debugging stale UI data on mobile.

  • feat(serve): the tailer manager now picks the freshest jsonl (by mtime) per resolved session name instead of the last one iterated, and a rescan goroutine runs every 30s so that when claude rotates to a new conversation in an existing tmux session, the tailer re-attaches automatically. Fixes "session shows 4-day stale activity while claude is actively writing".
  • fix(ui): drop openWhenHidden:true from the fetch-event-source config. iOS Safari kills the underlying socket when the tab is backgrounded; the library could not detect this and never retried. With the default behavior, the stream closes when the tab is hidden and reopens when it becomes visible, giving a clean reconnect every time the user returns. Server-side Last-Event-ID replay covers the missed window.
  • feat(cmd): ctm log-tool-use (the PostToolUse hook) used to silently return nil on every error path. A new warn() helper logs to stderr and appends a JSON entry to ~/.config/ctm/logs/.hook-errors.log, so silent drops leave a forensic trail. Hook contract preserved (still always exits 0).
  • chore: add .codeiq/ to .gitignore (auto-checkpoint had been capturing MBs of code-intel cache blobs).

Diagnosis

Two independent reviewers (one project-conventions-focused, one root-cause-focused via codex) agreed the daemon's hub→engine→enricher pipeline is structurally correct. The remaining real failure modes are:

  1. UUID rotation outside of startup adoption (fixed in feat(serve)).
  2. SSE silently dead on mobile after suspend (fixed in fix(ui)).
  3. Hook silently failing with no signal (fixed in feat(cmd)).

Test plan

  • go build -tags sqlite_fts5 ./... clean
  • go test -tags sqlite_fts5 -short ./internal/serve/... — all packages pass
  • Daemon /proc/PID/fd inspection: codeiq, docsiq, snipIT all moved to current-conversation jsonls after restart
  • CI green

🤖 Generated with Claude Code

aksOps and others added 4 commits May 1, 2026 07:04
When claude rotates to a new conversation inside an existing tmux
session, a fresh UUID jsonl appears in ~/.config/ctm/logs/ but the
tailer manager only ran adoption at startup. Result: daemon stayed
glued to the previous UUID and the UI showed days-stale activity for
sessions that were actively running.

Two changes in internal/serve/server.go:

1. Startup adoption now picks the freshest mtime per resolved session
   name. Previously os.ReadDir order (alphabetical) determined which
   UUID won when multiple log files mapped to the same session — often
   a stale historical conversation.

2. New rescanTailers method runs every 30s on a goroutine. It re-runs
   the freshest-per-name selection against the current logDir and calls
   TailerManager.Start, which is idempotent on (name, uuid) and rotates
   cleanly when the UUID has changed.

Verified by inspecting daemon /proc/PID/fd before/after — codeiq,
docsiq, snipIT all moved from 3-day-old jsonls to current-conversation
files immediately on restart.
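
The selection and rescan described above can be sketched roughly as follows. This is not the code from internal/serve/server.go: the candidate type, the newCandidate, scanLogDir and rescanLoop helpers, the resolve callback, and the assumption that each log file is named <uuid>.jsonl are all hypothetical. The start callback stands in for TailerManager.Start, whose real signature is not shown in this PR.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"path/filepath"
	"strings"
	"time"
)

// candidate is one jsonl log file that resolved to a session name.
// The type and its fields are hypothetical.
type candidate struct {
	Name    string    // resolved session name
	UUID    string    // conversation UUID, assumed to be the file's base name
	ModTime time.Time // last modification time of the jsonl
}

// newCandidate builds a candidate from a Unix timestamp in seconds.
func newCandidate(name, uuid string, unixSec int64) candidate {
	return candidate{Name: name, UUID: uuid, ModTime: time.Unix(unixSec, 0)}
}

// freshestPerName keeps, for each session name, the candidate with the
// newest ModTime, so directory iteration order no longer decides the winner.
func freshestPerName(cands []candidate) map[string]candidate {
	best := make(map[string]candidate, len(cands))
	for _, c := range cands {
		if cur, ok := best[c.Name]; !ok || c.ModTime.After(cur.ModTime) {
			best[c.Name] = c
		}
	}
	return best
}

// scanLogDir lists *.jsonl files in dir and maps each UUID to a session
// name through resolve, a stand-in for however the daemon resolves names.
func scanLogDir(dir string, resolve func(uuid string) (string, bool)) ([]candidate, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	var out []candidate
	for _, e := range entries {
		if e.IsDir() || filepath.Ext(e.Name()) != ".jsonl" {
			continue
		}
		info, err := e.Info()
		if err != nil {
			continue // file vanished between ReadDir and Info
		}
		uuid := strings.TrimSuffix(e.Name(), ".jsonl")
		if name, ok := resolve(uuid); ok {
			out = append(out, candidate{Name: name, UUID: uuid, ModTime: info.ModTime()})
		}
	}
	return out, nil
}

// rescanLoop re-runs the selection every interval until ctx is cancelled.
// start must be idempotent on (name, uuid), as the commit message states
// TailerManager.Start is.
func rescanLoop(ctx context.Context, interval time.Duration, dir string,
	resolve func(string) (string, bool), start func(name, uuid string)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			cands, err := scanLogDir(dir, resolve)
			if err != nil {
				continue // transient read error: try again on the next tick
			}
			for _, c := range freshestPerName(cands) {
				start(c.Name, c.UUID)
			}
		}
	}
}

func main() {
	picked := freshestPerName([]candidate{
		newCandidate("codeiq", "old-uuid", 100),
		newCandidate("codeiq", "new-uuid", 200),
	})
	fmt.Println(picked["codeiq"].UUID)
}
```

Because the per-tick work is a directory listing plus one idempotent call per session, a 30s interval keeps the cost low while bounding how long the UI can lag behind a rotation.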

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fetch-event-source's openWhenHidden:true keeps the connection logically
alive across visibility changes. On iOS Safari (and Chrome on iOS) the
OS suspends the network stack when the tab is backgrounded, killing
the underlying socket without firing onerror in JS. The library thinks
the stream is fine and never retries — the user returns to a tab whose
SSE has been silently dead for hours, showing pre-suspend data.

Drop the flag (default is false). The library now closes on
visibilitychange→hidden and reopens on visible, giving a clean
reconnect every time the user returns. Last-Event-ID replay covers
the missed window server-side.
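
For illustration, server-side Last-Event-ID replay might look like the sketch below. It is not taken from the daemon: the replayBuffer type, its methods, sseHandler, and the use of monotonically increasing numeric event IDs are all assumptions. The only parts taken from the SSE protocol are the Last-Event-ID request header and the id:/data: wire format.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"sync"
)

// event is one SSE message with a monotonically increasing ID (assumed scheme).
type event struct {
	ID   uint64
	Data string
}

// replayBuffer keeps the most recent max events so that a reconnecting
// client can be caught up. Hypothetical type, not the daemon's hub.
type replayBuffer struct {
	mu     sync.Mutex
	events []event
	nextID uint64
	max    int
}

func newReplayBuffer(max int) *replayBuffer {
	return &replayBuffer{max: max}
}

// add stores data under the next ID and drops the oldest entries beyond max.
func (b *replayBuffer) add(data string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.nextID++
	b.events = append(b.events, event{ID: b.nextID, Data: data})
	if len(b.events) > b.max {
		b.events = b.events[len(b.events)-b.max:]
	}
}

// since returns the buffered events whose ID is greater than lastID.
func (b *replayBuffer) since(lastID uint64) []event {
	b.mu.Lock()
	defer b.mu.Unlock()
	var out []event
	for _, e := range b.events {
		if e.ID > lastID {
			out = append(out, e)
		}
	}
	return out
}

// sseHandler replays the missed window before live streaming would begin.
func sseHandler(buf *replayBuffer) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/event-stream")
		w.Header().Set("Cache-Control", "no-cache")

		// A missing or unparsable header leaves last at 0, which replays
		// everything still held in the buffer.
		var last uint64
		if v := r.Header.Get("Last-Event-ID"); v != "" {
			last, _ = strconv.ParseUint(v, 10, 64)
		}
		for _, e := range buf.since(last) {
			fmt.Fprintf(w, "id: %d\ndata: %s\n\n", e.ID, e.Data)
		}
		if f, ok := w.(http.Flusher); ok {
			f.Flush()
		}
		// A real handler would now keep streaming live events until
		// r.Context() is done; that part is omitted here.
	}
}

func main() {
	buf := newReplayBuffer(128)
	buf.add("tool call recorded before the tab was hidden")
	buf.add("tool call recorded while the tab was hidden")
	fmt.Println(len(buf.since(1)), "event(s) to replay after ID 1")
}
```

The buffer size bounds how long a tab can stay hidden before the replay has gaps, so it would need to be sized against the expected event rate.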

Pairs with App.tsx's existing refetchOnWindowFocus on react-query
(shipped earlier this session) so on resume both the SSE stream and
the cached query data are fresh.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The PostToolUse hook (ctm log-tool-use) is invoked per tool call by
claude. It must always exit 0 to avoid blocking the tool pipeline,
which until now meant every error path was a silent return nil. If
sessionID was empty, the logs dir was missing, the stdin was malformed
JSON, or the file write failed, no signal reached the user — the
daemon would simply observe nothing for that session and the UI would
show 'idle' with no clue why.

Add a warn() helper that surfaces failures via two channels:
- slog.Warn to stderr (claude usually captures hook stderr)
- A one-line JSON entry appended to ~/.config/ctm/logs/.hook-errors.log
  for a forensic trail

Hook contract preserved — every error path still returns nil. Empty
stdin stays silent (legitimately a no-op).

Failure modes covered: stdin read, JSON unmarshal, missing/invalid
session_id (sanitizes to 'unknown' which the daemon won't tail under
any session name), mkdir, marshal, openfile, write.

To diagnose 'session shows idle but is active' reports going forward,
tail .hook-errors.log during the repro.
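
A minimal sketch of the pattern, assuming Go 1.21+ for log/slog. The hookError fields, the stage labels, the warn signature, and the runHook and readHookErrors helpers are hypothetical; only the .hook-errors.log file name, the slog.Warn call, and the always-return-nil contract come from the description above. The sketch also returns right after warning about a missing session_id, where the real hook falls back to 'unknown'.

```go
package main

import (
	"bufio"
	"encoding/json"
	"errors"
	"fmt"
	"log/slog"
	"os"
	"path/filepath"
	"time"
)

// hookError is one line of .hook-errors.log. Field names are assumptions.
type hookError struct {
	Time  string `json:"time"`
	Stage string `json:"stage"`
	Error string `json:"error"`
}

// warn reports a hook failure on stderr and appends it to the error log.
// It returns nothing on purpose: a failure to log must not fail the hook.
func warn(logDir, stage string, cause error) {
	slog.Warn("log-tool-use failed", "stage", stage, "err", cause)
	line, err := json.Marshal(hookError{
		Time:  time.Now().UTC().Format(time.RFC3339),
		Stage: stage,
		Error: cause.Error(),
	})
	if err != nil {
		return
	}
	if err := os.MkdirAll(logDir, 0o700); err != nil {
		return
	}
	f, err := os.OpenFile(filepath.Join(logDir, ".hook-errors.log"),
		os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o600)
	if err != nil {
		return
	}
	defer f.Close()
	_, _ = f.Write(append(line, '\n'))
}

// runHook mimics the hook contract: every path returns nil.
func runHook(logDir string, stdin []byte) error {
	if len(stdin) == 0 {
		return nil // empty stdin is a legitimate no-op and stays silent
	}
	var payload struct {
		SessionID string `json:"session_id"`
	}
	if err := json.Unmarshal(stdin, &payload); err != nil {
		warn(logDir, "unmarshal", err)
		return nil
	}
	if payload.SessionID == "" {
		warn(logDir, "session_id", errors.New("missing session_id"))
		return nil // simplification: the real hook continues as 'unknown'
	}
	// Writing the tool-use record itself is omitted from this sketch.
	return nil
}

// readHookErrors parses the error log, skipping lines that are not valid JSON.
func readHookErrors(logDir string) ([]hookError, error) {
	f, err := os.Open(filepath.Join(logDir, ".hook-errors.log"))
	if err != nil {
		return nil, err
	}
	defer f.Close()
	var out []hookError
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		var e hookError
		if err := json.Unmarshal(sc.Bytes(), &e); err != nil {
			continue
		}
		out = append(out, e)
	}
	return out, sc.Err()
}

func main() {
	dir := filepath.Join(os.TempDir(), "ctm-hook-demo")
	_ = runHook(dir, []byte("{not json"))
	entries, _ := readHookErrors(dir)
	fmt.Println(len(entries), "entries in the hook error log")
}
```

One JSON object per line keeps the log easy to tail during a repro and easy to parse afterwards.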

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

.codeiq/ holds a local code-intelligence cache (.mv.db) and a neo4j
graph store rebuilt on demand by the codeiq tool. Without this rule,
the auto-checkpoint hook captured several MB of binary blobs into
'pre-yolo' snapshot commits — clean up the working tree and prevent
future contamination.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@aksOps aksOps merged commit c8e1224 into main May 1, 2026
8 checks passed
@aksOps aksOps deleted the feat/observability-and-tailer-rotation branch May 1, 2026 07:12